A chance-corrected measure of inter-annotator agreement for syntax

Author

  • Arne Skjærholt
Abstract

Following the works of Carletta (1996) and Artstein and Poesio (2008), there is an increasing consensus within the field that in order to properly gauge the reliability of an annotation effort, chance-corrected measures of inter-annotator agreement should be used. With this in mind, it is striking that virtually all evaluations of syntactic annotation efforts use uncorrected parser evaluation metrics such as bracket F1 (for phrase structure) and accuracy scores (for dependencies). In this work we present a chance-corrected metric based on Krippendorff’s α, adapted to the structure of syntactic annotations and applicable both to phrase structure and dependency annotation without any modifications. To evaluate our metric we first present a number of synthetic experiments to better control the sources of noise and gauge the metric’s responses, before finally contrasting the behaviour of our chance-corrected metric with that of uncorrected parser evaluation metrics on real corpora.
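The metric itself is not defined in this abstract. As a rough, hypothetical illustration of the general idea, the sketch below computes a Krippendorff's-α-style chance-corrected score, α = 1 - D_o / D_e (observed over expected disagreement), for an arbitrary distance function between annotations. The `alpha` function, its signature, and the toy 0/1 distance in the usage example are assumptions for illustration only; the paper's actual distance functions over phrase-structure and dependency analyses are not reproduced here.

```python
# Minimal sketch of a chance-corrected agreement score in the spirit of
# Krippendorff's alpha: alpha = 1 - D_o / D_e, where D_o is the observed
# disagreement between annotators and D_e is the disagreement expected by
# chance. The distance function is a hypothetical stand-in; for syntax one
# would plug in a distance over phrase-structure or dependency analyses.

from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def alpha(annotations: Sequence[Sequence[T]],
          distance: Callable[[T, T], float]) -> float:
    """annotations[a][i] is annotator a's analysis of item (sentence) i."""
    n_annotators = len(annotations)
    n_items = len(annotations[0])

    # Observed disagreement: mean pairwise distance between annotators
    # on the same item.
    pairs = list(combinations(range(n_annotators), 2))
    d_o = sum(distance(annotations[a][i], annotations[b][i])
              for i in range(n_items) for a, b in pairs)
    d_o /= n_items * len(pairs)

    # Expected disagreement: mean distance over all pairs of annotations,
    # ignoring which item they belong to (i.e. pairing by chance).
    flat = [ann for annotator in annotations for ann in annotator]
    chance_pairs = list(combinations(range(len(flat)), 2))
    d_e = sum(distance(flat[a], flat[b]) for a, b in chance_pairs)
    d_e /= len(chance_pairs)

    return 1.0 - d_o / d_e

# Toy usage with categorical labels and a 0/1 distance; a syntactic variant
# would substitute tree-valued annotations and a tree distance.
ann = [["NOUN", "VERB", "NOUN"],   # annotator 1
       ["NOUN", "VERB", "ADJ"]]    # annotator 2
print(alpha(ann, lambda x, y: float(x != y)))  # ~0.55
```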


Similar articles

Accounting for Chance Agreement in Gesture Elicitation Studies

The level of agreement among participants is a key aspect of gesture elicitation studies, and it is typically quantified by means of agreement rates (AR). We show that this measure is problematic, as it does not account for chance agreement. The problem of chance agreement has been extensively discussed in a range of scientific fields in the context of inter-rater reliability studies. We review...

Evaluating Inter-Annotator Agreement on Historical Spelling Normalization

This paper deals with means of evaluating inter-annotator agreement for a normalization task. This task differs from common annotation tasks in two important aspects: (i) the class of labels (the normalized wordforms) is open, and (ii) annotations can match to different degrees. We propose a new method to measure inter-annotator agreement for the normalization task. It integrates common chance-corrected...

Influence of Text Type and Text Length on Anaphoric Annotation

We report the results of a study investigating the agreement of anaphoric annotations. The study focuses on the influence of text length and text type, using a corpus of scientific articles and newspaper texts. In order to measure inter-annotator agreement, we compare existing approaches and propose to measure each step of the annotation process separately instead of measuring the...

Enhancing the Arabic Treebank: a Collaborative Effort toward New Annotation Guidelines

The Arabic Treebank team at the Linguistic Data Consortium has significantly revised and enhanced its annotation guidelines and procedure over the past year. Improvements were made to both the morphological and syntactic annotation guidelines, and annotators were trained in the new guidelines, focusing on areas of low inter-annotator agreement. The revised guidelines are now being applied in an...

An Agreement Measure for Determining Inter-Annotator Reliability of Human Judgements on Affective Text

An affective text may be judged to belong to multiple affect categories, as it may evoke different affects with varying degrees of intensity. For affect classification of text, it is often necessary to annotate a text corpus with affect categories, a task typically performed by a number of human judges. This paper presents a new agreement measure, inspired by the Kappa coefficient, to compute inter-annotator...


Journal:

Volume:   Issue:

Pages: -

Publication year: 2014